Custom template for CAST



Start with a short introduction about what is going on in this document

Introduction


Introduction to PDCA, CAST and the methods contained within this document go here.

Background


Explain some background info that is needed. Change header colours to purple?

Sample Data

Step through an explanation of the two-stream sample data included. Plot both oil and gas production. Pick one well to apply across. Should this be before the Arps parameters? Also show the uwi.matrix table where we summarize important parameters from each well

Summary Table

Key parameters for each production dataset have been summarized and compiled into a table. Cumulative months on production, initial oil and gas rates, peak oil and gas rates, peak month, and cumulative volumes can be seen for each UWI. These parameters will be used in subsequent calculations and help us prioritize wells as we begin to look at learning from wells with more production history later in the PDCA process.


UWI OnProd Date Initial Oil Rate Peak Oil Rate Peak Month Cum Oil
100/05-17-081-16W6/00 2018-04-01 26.123677 328.6216 3 62958.48
100/11-17-081-16W6/00 2018-05-01 219.027416 544.5717 2 95700.09
100/12-17-081-16W6/00 2018-05-01 265.308244 538.2610 2 82928.63
102/05-17-081-16W6/00 2018-04-01 6.646233 406.3637 3 39977.40
102/12-17-081-16W6/00 2018-05-01 135.210625 439.8464 2 60875.30
103/05-17-081-16W6/00 2018-04-01 7.631636 378.4579 3 58147.41


Production Plots

Explain the Arps decline models used historically for forecasting and which models will be analyzed here

Arps is a common methodology for forecasting well production.

Modified Hyperbolic

Explanation of that equation and a definition using code..?

We’ll define the modified hyperbolic equation in executable form. The equation will accept decline parameters and a time series across which it can evaluate production rate. First we’ll need a function to convert secant decline percentage into nominal form.
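As a hedged sketch of what those two functions might look like (the names MH and secant.to.nominal, the day-based time vector, and the b > 0 assumption are ours, not necessarily the document's):

```r
# Convert an annual secant (effective) decline fraction to nominal decline.
# Hyperbolic: a = ((1 - de)^(-b) - 1) / b; exponential (b = 0): a = -log(1 - de)
secant.to.nominal <- function(de, b) {
  if (b == 0) -log(1 - de) else ((1 - de)^(-b) - 1) / b
}

# Modified hyperbolic rate: hyperbolic decline until the instantaneous decline
# reaches dlim, exponential afterward. Time t is in days; declines are annual.
MH <- function(qi, b, di, dlim, q.abd, t) {
  a.i   <- secant.to.nominal(di, b) / 365.25   # nominal initial decline, per day
  a.lim <- secant.to.nominal(dlim, 0) / 365.25 # nominal limiting decline, per day
  t.sw  <- (a.i / a.lim - 1) / (b * a.i)       # time at which decline hits dlim
  q.sw  <- qi * (1 + b * a.i * t.sw)^(-1 / b)  # rate at the switch point
  q <- ifelse(t <= t.sw,
              qi * (1 + b * a.i * t)^(-1 / b),
              q.sw * exp(-a.lim * (t - t.sw)))
  q[q < q.abd] <- 0                            # cut off below abandonment rate
  q
}
```

A single parameter set can then be evaluated across a monthly time vector, e.g. MH(300, 1.2, 0.8, 0.07, 3, 0:30 * 30.4).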

PDCA

Explain the background of what it is and how it can be helpful in forecasting, especially in handling uncertainty. Define PDCA here! Briefly explain Bayesian methods of achieving this? Priors and posteriors

Monte Carlo Method

Explain the place of the Monte Carlo method as our first try at estimating the posterior distribution, showing parameter distributions as defining our prior and explaining how each is sampled. Have inputs here for Monte Carlo, build any functions required, and run. Show clean overlaid histograms of the sampling for b, di, and qi based on the input criteria.

# Define distribution constraints for our Monte Carlo sampling of arps parameters
# Only oil values will be defined here.

## bi values
o.bi_min <- 0.5 #Initial oil b value min value
o.bi_max <- 2   #Initial oil b value max value

## di values; as decimals in secant. Will be converted to nominal in arps equation.
o.di_min <- 0.4   #Initial oil decline percentage min
o.di_max <- 0.98   #Initial oil decline percentage max

## qi multiplier or absolute values; multipliers will be applied to peak initial rates.
o.qi_min <- 10    #Absolute min oil rate (bbls/d) for qi
o.qi_max <- 3     #Peak oil rate multiplier to define max oil qi (3 * peak_rate)

## dlim values; as decimals in secant.
o.dlim <- 0.07    #Limiting decline percentage used in modified hyperbolic equation

# Abandonment values; as rates in production units
o.abd <- 3
g.abd <- 10

# To handle multiple distribution types, let's define a "DefineDistribution" class
# The class will contain distribution information in a list, defined in S4
# Type = Normal, Lognormal, or Uniform; min_mean = min bound for uniform or mean of lognormal 
# or normal; max_sd = max bound for uniform or standard deviation for lognormal or normal

setClass("DefineDistribution",slots = list(type="character",min_mean="numeric",max_sd="numeric"))

# Using this class, we will define uniform distributions for all of our inputs
b <- new("DefineDistribution",type = "Uniform",min_mean = o.bi_min,max_sd = o.bi_max)
di <- new("DefineDistribution",type = "Uniform",min_mean = o.di_min,max_sd = o.di_max)
dlim <- new("DefineDistribution",type = "Uniform",min_mean = o.dlim,max_sd = o.dlim)

#Let's pick the first well 100/05-17-081-16W6/00 for a peak rate to show the distribution of qi
o.qpeak <- matrix[1,"Oil.Peak.Rate"]

#Then define its distribution
qi <- new("DefineDistribution",type = "Uniform",min_mean = o.qi_min, max_sd = o.qpeak*o.qi_max)


Let’s visualize our defined distributions and overlay our samples from them. We’ll use a function named sample.any that is defined in the background to simply accept a DefineDistribution class object and randomly sample based on its type and inputs.
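A hedged sketch of what sample.any might look like (the real definition lives in the background code, so treat this as illustrative only):

```r
# DefineDistribution repeated from above so this sketch runs standalone
setClass("DefineDistribution",
         slots = list(type = "character", min_mean = "numeric", max_sd = "numeric"))

# Dispatch on the distribution type and draw n random samples accordingly
sample.any <- function(dist, n = 1) {
  switch(dist@type,
         "Uniform"   = runif(n, min = dist@min_mean, max = dist@max_sd),
         "Normal"    = rnorm(n, mean = dist@min_mean, sd = dist@max_sd),
         "Lognormal" = rlnorm(n, meanlog = dist@min_mean, sdlog = dist@max_sd),
         stop("Unknown distribution type: ", dist@type))
}
```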

Uniform Distributions of b, di and qi

MCMC Method

Explain here and have links to jump around document? Or explain when we get to it?

CAST


Production Model Synthesis

Explain process of forecast synthesis here, and show how it is carried out for the modified hyperbolic. Show the Arps substitution, and explain

Combine Sampled Parameters

Let’s combine and organize our sampled parameters in a data frame. We’ll use an equation-specific function to ensure completeness and repeatability for multi-well matching.
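As a hedged sketch of such an equation-specific helper (the name MH.sample.parameters is an assumption, and it takes simple c(min, max) pairs rather than the DefineDistribution objects defined above):

```r
# Hypothetical sketch: draw n Monte Carlo samples for each modified hyperbolic
# parameter from uniform ranges and organize them into one dataframe
MH.sample.parameters <- function(n, qi, b, di, dlim) {
  data.frame(qi   = runif(n, qi[1], qi[2]),
             b    = runif(n, b[1], b[2]),
             di   = runif(n, di[1], di[2]),
             dlim = runif(n, dlim[1], dlim[2]))
}
```

A call like MH.sample.parameters(10000, c(10, 3 * o.qpeak), c(0.5, 2), c(0.4, 0.98), c(0.07, 0.07)) would produce a table of the shape shown here.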

qi b di dlim
149.55661 1.3771858 0.9056401 0.07
702.25675 1.0070127 0.8109728 0.07
81.13509 0.7138906 0.9575267 0.07
417.07264 0.5976827 0.7614572 0.07
869.65404 1.2382736 0.5138593 0.07
413.75675 1.7013493 0.6649539 0.07
431.16627 0.5109597 0.5670319 0.07
619.74954 1.2612917 0.9717101 0.07
406.22448 0.5904154 0.4300673 0.07
877.48734 0.6632781 0.4752926 0.07

First 10 rows of modified hyperbolic parameters dataframe

Historical Rate Synthesis

Using the combined parameter dataframe and the modified hyperbolic equation, we can define a function to use each row of the parameter dataframe as a Monte Carlo iteration and substitute it into the Arps equation across the time interval of the current well. The function will create and return an mcIteration object for each iteration. This object is a list with slots for Iteration name, Parameters, and Synth. Some additional slots will be created and populated later as we approach cost and forecasting.

# A function, MH.synthesize, will be defined to carry out rate synthesis across the actuals 
# time interval.
MH.synthesize<-function(parameters,q.abd,t){
  #Parse up parameters and pass into MH function call
  fc <-  MH(parameters[["qi"]],parameters[["b"]],parameters[["di"]],parameters[["dlim"]],q.abd,t)
  
  #Simplify parameters into a single string representation
  parametersName<-paste(names(parameters),round(parameters,digits=2),sep = "=",collapse=" ")
  
  #Create mcIteration imitation list to improve performance and ease of combining data
  #Return mcIteration. Remaining list items to be appended by later functions
  return(list("Iteration" = parametersName,"Parameters" = parameters,"Synth"=fc))
}

# Define a time interval of actuals to synthesize across, using 100/05-17-081-16W6/00
uwi<-"100/05-17-081-16W6/00"

# Subset production including only 100/05-17-081-16W6/00
singleWell.all <- formated_production$CD[which(formated_production$CD$Unique.Well.ID == 
                                              uwi),]

# Find peak oil month for this uwi
uwi.pm <- matrix[["Oil.Peak.Month"]][which(matrix$eachUWI==uwi)]

# Refine production datatable to only include data from the peak oil rate onward
# (guard uwi.pm == 1, where -c(1:0) would wrongly drop the first row)
singleWell <- if (uwi.pm > 1) singleWell.all[-(1:(uwi.pm - 1)), ] else singleWell.all

# Create time vector starting at 0 for peak oil month
synth.time <- (singleWell$CD.Month - uwi.pm) * 30.4 # Converted to daily

# Let's apply the MH.synthesize function in an apply call to execute across each row
# of the parameters dataframe
oil.synth<-apply(MH.sampled.parameters,1,MH.synthesize,q.abd = o.abd, t = synth.time)

# Print first object of oil.synth to show mcIteration object
print(oil.synth[[1]])
## $Iteration
## [1] "qi=149.56 b=1.38 di=0.91 dlim=0.07"
## 
## $Parameters
##          qi           b          di        dlim 
## 149.5566092   1.3771858   0.9056401   0.0700000 
## 
## $Synth
##  [1] 149.55661 125.68583 109.24546  97.13914  87.80085  80.34768  74.24143
##  [8]  69.13402  64.78971  61.04274  57.77297  54.89099  52.32883  50.03381
## [15]  47.96446  46.08761  44.37643  42.80894  41.36697

Each mcIteration list, defining a Monte Carlo iteration and historical synthesis, can now be evaluated against the actual data to search for an appropriate model.

Loss Functions

How do we compare the synthesis forecasts against our actuals and determine what is a good fit and what is a poor fit?

To compare each synthesis to the historical production data observed, we need to use functions that compare predicted points to actuals for each model. These functions are commonly referred to as loss functions. The most common loss function is the L2 loss function (insert link here?), which is popular due to its easy implementation, especially in linear models. However, the L2 loss is not as useful when comparing data with significant outliers, such as oil and gas production data, because it assigns too much weight to these outliers and rejects models that do not match them. We will need to look at more robust loss functions to find models that match the historical data as an engineer would.

Soft L1

Soft L1 loss is a robust, continuous variation of the common Huber loss that gives less weight to outliers than an L2 loss, making it useful for fitting time-series production data. Our Soft L1 definition will return the individual loss of each point and an average cost for all residuals input.
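As a sketch, one common Soft L1 form (used, for example, by SciPy's least_squares) is rho(r) = 2 * (sqrt(1 + r^2) - 1); a minimal version returning both outputs might be:

```r
# Soft L1 loss: behaves like L2 for small residuals and like L1 for large
# ones, so outliers are down-weighted relative to a squared-error loss
soft_L1 <- function(residuals) {
  indv <- 2 * (sqrt(1 + residuals^2) - 1)
  list(IndvCost = indv, Cost = mean(indv))
}
```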


To apply the loss equation we’ll define a function to apply the Soft L1 loss to the residuals of each model match. The function will include an option to apply the loss function against the log residuals, an option that we’ll take advantage of due to the non-linear nature of production data. The function will populate the IndvCost and Cost slots of each mcIteration object, and once the function is defined, we can use these slots to visualize each model’s fit.
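A hedged sketch of that function (the name apply.cost is an assumption; the document's actual definition lives in the background code):

```r
# Compute residuals between actuals and the synthesized rates (optionally in
# log space), apply Soft L1, and fill the IndvCost and Cost slots
apply.cost <- function(iteration, actuals, log.resid = TRUE) {
  pred  <- iteration$Synth[seq_along(actuals)]
  resid <- if (log.resid) log(actuals) - log(pred) else actuals - pred
  indv  <- 2 * (sqrt(1 + resid^2) - 1)   # Soft L1, inlined
  iteration$IndvCost <- indv
  iteration$Cost     <- mean(indv)
  iteration
}
```

It might then be applied across every iteration with something like lapply(oil.synth, apply.cost, actuals = singleWell[["Oil.Rate"]]).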

## $Iteration
## [1] "qi=149.56 b=1.38 di=0.91 dlim=0.07"
## 
## $Parameters
##          qi           b          di        dlim 
## 149.5566092   1.3771858   0.9056401   0.0700000 
## 
## $Synth
##  [1] 149.55661 125.68583 109.24546  97.13914  87.80085  80.34768  74.24143
##  [8]  69.13402  64.78971  61.04274  57.77297  54.89099  52.32883  50.03381
## [15]  47.96446  46.08761  44.37643  42.80894  41.36697
## 
## $IndvCost
##  [1] 24.8376599 19.5115191 14.6114603 10.3052483  8.8590019  7.5320440
##  [7] 10.1208577  6.8127360  5.2085792  3.8434490  2.0350757  2.7126292
## [13]  0.4156658  2.5246885  2.2184544  2.2649887  3.7291126  3.6118306
## [19]  3.6339453
## 
## $Cost
## [1] 7.094155

With cost calculated using the Soft L1 loss function applied to the standard and log residuals for every model iteration, let’s use the average cost to find our “best fit” model out of the 10,000 iterations. We’ll plot the model against the actual oil rate, coloured by individual cost for both residual methods, to look at how each handles outliers.

# Use lapply and do.call to combine the average cost of every model object 
# into a single dataframe
oil.cost <- do.call(rbind,lapply(oil.synth,function(x){
  return(x$Cost)
}))

# Repeat for log residuals
oil.cost.log <- do.call(rbind,lapply(oil.synth.log,function(x){
  return(x$Cost)
}))

# Find index of best fit model using log soft_L1 residuals
oil.bf.log <- which.min(oil.cost.log)

# Plot this "best fit" against actuals and colour by standard and log residual cost
# To plot, we'll utilize some custom plotly functions defined in plotResources external file
oil.synth.plot <- plotly.multimodel.indvcostcolor(actuals.x = singleWell[["CD.Month"]],
                                                  actuals.y = singleWell[["Oil.Rate"]],
                                                  iterationObject = oil.synth[oil.bf.log],
                                                  scales = "semi-log")
oil.synth.plot <- layout(oil.synth.plot, xaxis = list(title = "Cal-Day Month"),
                         yaxis = list(title = "Oil Rate (bbls/d)"), 
                         title = "Soft L1 Cost of Residuals")

oil.synth.plot.log <- plotly.multimodel.indvcostcolor(actuals.x = singleWell[["CD.Month"]],
                                                  actuals.y = singleWell[["Oil.Rate"]],
                                                  iterationObject = oil.synth.log[oil.bf.log],
                                                  scales = "semi-log")

oil.synth.plot.log <- layout(oil.synth.plot.log, xaxis = list(title = "Cal-Day Month"),
                         yaxis = list(title = "Oil Rate (bbls/d)"), 
                         title = "Soft L1 Cost of Log Residuals")
# Combine the two cost plots into a subplot

combined.cost <- subplot(oil.synth.plot,oil.synth.plot.log,nrows = 1,titleX = TRUE,titleY=FALSE)
combined.cost <- layout(combined.cost,title = "Soft L1 Cost of Absolute vs. Log Residuals",
                        yaxis=list(title="Oil Rate (bbls/d)"))
#IMPROVE TOOLTIP???
hide_colorbar(combined.cost)


Comparing the absolute residual cost in the left plot against the log residual cost in the right plot, there is similar relative spread in the residuals and both cost the peak rate similarly, but the log residuals do a better job of costing the sub-peak at month 9. We will carry the log residual cost through the remainder of the CAST methodology as it captures the non-linearity of the production data better. One important thing to notice before moving on is that the Soft L1 function gives a high cost to the peak oil rate and does not match it perfectly as a result. This may be concerning when compared to traditional DCA matches, but given that early-time production data can be far from steady-state flowing behaviour, it is a useful feature to maintain.

Using the log residuals and the Soft L1 cost function, let’s find the top 100 models from our initial 10,000 iterations and see how they describe the ranges of our actual oil rates. We’ll colour them by average cost to compare the forecasts.

Top 10% of Models Using Soft L1 Loss


We can see that the top 100 forecasts do a good job of covering the range of actual oil rates, with the lowest-cost forecasts tightly grouped around the actuals. We’ve shown we can use Monte Carlo sampling paired with a loss function to match historical production! Before moving on, let’s note that the majority of the forecasts under-predict the initial rate. This is primarily driven by the high cost of the peak oil rate using the Soft L1, but could also be improved through our input distributions. This is something we can improve upon later.

PDCA Loss

Explain this loss function, where it’s from, and show a comparison of application to the Soft L1 loss

The Soft L1 loss function does a decent job of finding a best-fit model, but is there a better loss function that can be used? Let’s look at a more complete loss function defined in SPE-174784-PA (ADD LINK HERE AS REFERENCE) by Fulford et al., which takes a similar approach, but .. EXPLAIN HERE

Just as we did with the Soft L1, let’s use the PDCA loss function to find the top 100 forecasts from the same 10,000 iterations. We’ll set the minimum mean and minimum standard deviation of epsilon to 0, assuming a perfect model. Again we’ll colour by average cost and compare to the Soft L1 method.
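Purely as a speculative illustration of the idea (not necessarily the paper's actual formulation), a likelihood-style loss could score the log residuals epsilon against an assumed perfect model with mean 0, using the sample spread as the scale:

```r
# Speculative sketch only: negative log-likelihood of the residuals under a
# zero-mean normal model; a tighter, less biased fit earns a lower cost
pdca.loss <- function(epsilon) {
  sigma <- max(sd(epsilon), 1e-6)  # floor the scale to avoid a zero sd
  nll <- -mean(dnorm(epsilon, mean = 0, sd = sigma, log = TRUE))
  list(Cost = nll)
}
```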

Comparing Top 10% of Models Using Soft L1 and PDCA Loss

Summarize findings in these two charts and how we’re going to carry PDCA loss going forward in the rest of the evaluation / process as it provides a better probabilistic representation

Cost Evaluation

This has already been done. Still need?

Best Fit

Show best fit using either loss function and explain how this may be a good fit for wells with good production history; even so, we still want to create a probabilistic outcome for each well to capture forecast uncertainty

Show top 10% of functions, etc. using either loss function, and show that the Monte Carlo method returns adequate forecasts for wells with good production history, but not for short-history wells.. What’s next?? MCMC

Markov Chain Monte Carlo (MCMC)

Explain concept or link above if put there.

Burn-In

Show MC method as burn-in to determine best fit forecast to apply PDCA loss function to

Jumping function and distributions
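As a placeholder illustration of a jumping function (all names hypothetical), a minimal Metropolis-style step might propose new parameters around the current ones and accept or reject based on the change in cost:

```r
# One Metropolis step: propose nearby parameters, then accept with
# probability exp(-delta cost); always accepts when the cost improves
mcmc.step <- function(current, cost.fn, jump.sd) {
  proposal <- current + rnorm(length(current), mean = 0, sd = jump.sd)
  ratio <- exp(cost.fn(current) - cost.fn(proposal))  # > 1 when cost improves
  if (runif(1) < ratio) proposal else current
}
```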

Results

Probabilistic forecasts

Creation of P10, P50, Avg, and P90 forecasts
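A hedged sketch of how those forecasts might be assembled, assuming the accepted synths are stacked into a matrix with one row per iteration and one column per time step (oil-and-gas convention: P10 is the high case, i.e. the 90th percentile of the rate distribution):

```r
# Percentile and mean forecasts across accepted Monte Carlo / MCMC iterations
prob.forecasts <- function(synths) {
  list(P10 = apply(synths, 2, quantile, probs = 0.90),  # high case
       P50 = apply(synths, 2, quantile, probs = 0.50),  # median
       Avg = colMeans(synths),                          # mean (expected) case
       P90 = apply(synths, 2, quantile, probs = 0.10))  # low case
}
```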

 




Darcy Redpath
Petronas Canada